Gene expression profiles in stomach cancer

ABSTRACT

The present invention results from the examination of tissue from stomach tumors to identify genes that are differentially expressed between cancerous and normal tissue. The invention includes diagnostic, monitoring, drug design and therapeutic methods using these genes, as well as solid supports comprising oligonucleotide arrays that are complementary to or hybridize to the differentially expressed genes.

RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Applications 60/341,816 and 60/343,191, both of which are herein incorporated by reference in their entirety.

FIELD OF THE INVENTION

The invention relates generally to the changes in gene expression in stomach tissue from patients with gastric cancer. The invention specifically relates to a set of human genes that are differentially expressed in cancerous stomach tissue compared to non-cancerous stomach tissue.

BACKGROUND OF THE INVENTION

Stomach Cancer

In the United States, approximately 24,000 new cases of stomach cancer, or gastric cancer, are diagnosed every year. Although the incidence of stomach cancer has declined significantly in the last 60 years, it is still a serious disease caused by factors that remain elusive. Under similar circumstances, some people develop stomach cancer and others do not.

Stomach cancer usually occurs in people over the age of 55 and is twice as common in men as in women. This type of cancer is not one of the major ones in the United States, but it is much more prevalent in Japan, Korea, Latin America and parts of Eastern Europe, where people eat more foods that are preserved by drying, pickling, smoking or salting. Conversely, consuming fresh fruits and vegetables may protect against this disease.

Stomach cancer can develop in any part of the stomach and spread throughout the stomach and/or to other organs. The cancer may also grow along the stomach wall and spread to the esophagus or small intestine. If the cancer grows through the stomach wall, it can extend to nearby lymph nodes, the liver, the pancreas and the colon. Stomach cancer can spread even farther, to the ovaries, lungs and distant lymph nodes. When stomach cancer metastasizes to another part of the body, these tumor cells are of the same type as those in the original tumor. In other words, metastasized cells in the liver are still stomach tumor cells. Such tumor cells that spread to an ovary, establishing one or more ovarian tumors, are known as Krukenberg tumors and are composed of transformed stomach cells, not ovarian cells.

Because the symptoms of stomach cancer are non-specific, this cancer is difficult to detect in its early stages. Symptoms include indigestion, heartburn, abdominal pain, nausea and vomiting, diarrhea or constipation, loss of appetite, weakness and fatigue, and bleeding which is detected by blood in the stool or by the affected person vomiting blood. Diagnosis is usually performed by x-rays of the upper gastrointestinal tract and esophagus, the x-rays taken after the patient has consumed a liquid barium tracer. Endoscopy of the stomach and esophagus, with a gastroscope, can also be performed. If abnormal tissue is found, it can be biopsied through the gastroscope. Should the biopsy specimen show cancerous cells, surrounding lymph nodes are then biopsied, and surrounding organs, such as the liver and pancreas, are examined via CT scan to determine the extent or stage of the disease. Treatment methods for stomach cancer are similar to those employed in other types of cancer: removal of the affected organ (partial or total gastrectomy), possibly with removal of nearby lymph nodes as well, chemotherapy, radiation therapy and immunotherapy (stimulating immune system components that attack cancer cells) (http://cancemet.nci.nih.gov/cancertypes.html). As early stomach cancer causes few symptoms, diagnosis is not usually made before the advanced stages of the disease, where treatments are less effective.

Molecular Changes in Stomach Cancer

Little is known about the molecular changes in stomach cells associated with the development and progression of stomach cancer. Accordingly, there exists a need for the investigation of the changes in global gene expression levels, as well as the need for the identification of new molecular markers associated with the development and progression of stomach cancer. Furthermore, if intervention is expected to be successful in halting or slowing the progression of stomach cancer, means of accurately assessing the early manifestations of this disease need to be established. One way to accurately assess the early manifestations of stomach cancer is to identify markers which are uniquely associated with disease progression (see for example Kim et al., Oncogene 20:4568-4575, 2001). Likewise, the development of therapeutics to prevent or stop the progression of stomach cancer relies on the identification of genes responsible for cancerous transformation and growth in the stomach.

To date, researchers have been able to identify a few genetic alterations believed to underlie tumor development. These genetic alterations include amplification of oncogenes and mutations that result in the loss of tumor suppressor genes. Tumor suppressor genes are genes that, in their wild-type alleles, express proteins that suppress abnormal cellular proliferation. When the gene coding for a tumor suppressor protein is mutated or deleted, the resulting mutant protein or the complete lack of tumor suppressor protein expression may fail to correctly regulate cellular proliferation, and abnormal proliferation may take place, particularly if there is already existing damage to the cellular regulatory mechanism. A number of well-studied human tumors and tumor cell lines have missing or non-functional tumor suppressor genes. Examples of tumor suppressor genes include, but are not limited to, the retinoblastoma susceptibility gene or RB gene, the p53 gene, the deleted in colon carcinoma (DCC) gene and the neurofibromatosis type 1 (NF-1) tumor suppressor gene (Weinberg, Science 254:1138-1146, 1991). Loss of function or inactivation of tumor suppressor genes may play a central role in the initiation and/or progression of a significant number of human cancers.

Classification of heterogeneous populations of tumor types is a daunting task; yet, initial studies utilizing gene expression patterns to identify subtypes of cancer produced rather intriguing results (see Perou et al., Proc Natl Acad Sci USA 96:9212-9217, 1999; Golub et al., Science 286:531-537, 1999; Alizadeh et al., Nature 403:503-511, 2000; Alon et al., Proc Natl Acad Sci USA 96:6745-6750, 1999; and Bittner et al., Nature 406:536-540, 2000). Molecular classification of B-cell lymphoma by gene expression profiling elucidated clinically distinct diffuse large-B-cell lymphoma subgroups (see Alizadeh supra). Stratification of patients based on their distinctive gene expression profiles may allow researchers to precisely group similar patient populations for evaluating chemotherapeutic agents. The more homogeneous population of patients decreases the variability of patient-to-patient responses, leading to the development of agents capable of eradicating specific subtypes of cancers previously unknown using standard classification techniques.

The utilization of gene expression profiles to classify tumors, to identify drug targets, to identify diagnostic markers and/or to gain further insights into the consequences of chemotherapeutic treatments could facilitate the design of more efficacious patient-specific stratagems for treating a variety of cancers. In breast cancer, a study utilizing a limited number of genes (8,102 genes) classified tumors into subtypes based on gene expression profiles and indicated a diversity of molecular phenotypes associated with breast tumors (Perou et al., Nature 406:747-752, 2000). The advent of cDNA and oligonucleotide arrays has enabled researchers to map tissue-specific expression levels for thousands of genes (Alon et al., Proc Natl Acad Sci USA 96:6745-6750, 1999; Iyer et al., Science 283:83-87, 1999; Khan et al., Cancer Res 58:5009-5013, 1998; Lee et al., Science 285:1390-1393, 1999; Wang et al., Gene 229:101-108, 1999; Whitney et al., Ann Neurol 46:425-428, 1999). The study by Martin et al. (Cancer Res 60:2232-2238, 2000) used a custom microarray composed of 124 genes discovered by differential display associated with either normal breast epithelial cells or the MDA-MB-435 malignant breast tumor cell line. Using the custom microarray, researchers examined the relationship between expression patterns discovered by clustering a number of genes with clinical stages of breast cancer, indicating that gene expression patterns were capable of grouping breast tumors into distinct categories (Martin et al., supra).

Although these studies have demonstrated that expression profiling may be used to produce improvements in diagnosis of human diseases such as cancer, as well as in the development of improved therapeutic strategies, further studies are needed. Accordingly, there remains a need in the art for materials and methods that permit a more accurate diagnosis of stomach cancer. In addition, there remains a need in the art for methods to treat and methods to identify agents that can effectively treat this disease. The present invention meets these and other needs.

SUMMARY OF THE INVENTION

The present invention is based on the discovery of the genes and their expression profiles associated with various types and stages of stomach cancer.

The invention includes methods of diagnosing stomach cancer in a patient, comprising the step of detecting the level of expression in a tissue sample of one or more genes from Table 1; wherein differential expression of the genes in Table 1 is indicative of stomach cancer. The invention also includes methods of detecting the progression of stomach cancer. For instance, methods of the invention include detecting the progression of stomach cancer in a patient, comprising the step of detecting the level of expression in a tissue sample of one or more genes from Table 1; wherein differential expression of the genes in Table 1 is indicative of stomach cancer progression. In some preferred embodiments, PCA analysis based on all or a portion of the group of genes identified in Table 1 may be used to differentiate between the different stages of stomach cancer, such as in the metastasis of the disease to healthy regions of the stomach and to other organs. In some preferred embodiments, one or more genes may be selected from Table 1.

In some aspects, the present invention provides a method of monitoring the treatment of a patient with stomach cancer, comprising administering a pharmaceutical composition to the patient, preparing a gene expression profile from a cell or tissue sample from the patient and comparing the patient gene expression profile to a gene expression profile from a cell population comprising normal stomach cells, or to a gene expression profile from a cell population comprising diseased stomach cells, or to both. In some preferred embodiments, the gene profile will include the expression level of one or more genes in Table 1.

Another aspect of the present invention includes a method of treating a patient with stomach cancer, comprising administering to the patient a pharmaceutical composition, wherein the composition alters the expression of at least one gene in Table 1, preparing a gene expression profile from a cell or tissue sample from the patient comprising diseased cells and comparing the patient expression profile to a gene expression profile from an untreated cell population comprising stomach tumor cells.

In another aspect, the present invention provides a method of detecting the progression of carcinogenesis in a patient, comprising detecting the level of expression in a tissue sample of one or more genes from Table 1; wherein differential expression of the genes in Table 1 is indicative of stomach carcinogenesis.

The invention further includes methods of screening for an agent capable of modulating the onset or progression of stomach cancer, comprising the steps of exposing a cell to the agent; and detecting the expression level of one or more genes from Table 1. In some preferred embodiments, one or more genes may be selected from a group consisting of those listed in Table 1. In some preferred methods, it may be desirable to detect all or nearly all of the genes in the table.

The invention further includes compositions comprising at least two oligonucleotides, wherein each of the oligonucleotides comprises a sequence that specifically hybridizes to a gene in Table 1, as well as solid supports comprising at least two probes, wherein each of the probes comprises a sequence that specifically hybridizes to a gene in Table 1. In some preferred embodiments, one or more genes may be selected from a group consisting of those listed in Table 1.

The invention further includes computer systems comprising a database containing information identifying the expression level in stomach tissue of a set of genes comprising at least two genes in Table 1 and a user interface to view the information. In some preferred embodiments, one or more genes may be selected from a group consisting of those listed in Table 1. The database may further include sequence information for the genes, information identifying the expression level for the set of genes in normal stomach tissue and in cancerous stomach tissue and may contain links to external databases such as GenBank.

Lastly, the invention includes methods of using the databases, such as methods of using the disclosed computer systems to present information identifying the expression level in a tissue or cell of at least one gene in Table 1, comprising the step of comparing the expression level of at least one gene in Table 1 in the tissue or cell to the level of expression of the gene in the database. In some preferred embodiments, one or more genes may be selected from a group consisting of those listed in Table 1.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Many biological functions are accomplished by altering the expression of various genes through transcriptional (e.g., through control of initiation, provision of RNA precursors, RNA processing, etc.) and/or translational control. For example, fundamental biological processes such as cell cycle, cell differentiation and cell death, are often characterized by the variations in the expression levels of groups of genes.

Changes in gene expression also are associated with pathogenesis. For example, the lack of sufficient expression of functional tumor suppressor genes and/or the overexpression of oncogenes/proto-oncogenes could lead to tumorigenesis or hyperplastic growth of cells (Marshall, Cell 64:313-326, 1991; Weinberg, Science 254:1138-1146, 1991). Thus, changes in the expression levels of particular genes (e.g., oncogenes or tumor suppressors) serve as signposts for the presence and progression of various diseases.

Monitoring changes in gene expression may also provide certain advantages during drug screening and development. Often drugs are pre-screened for the ability to interact with a major target without regard to other effects the drugs have on cells. Often such other effects cause toxicity in the whole animal, which prevents the development and use of the potential drug.

Applicants have examined samples from normal stomach tissue and from cancerous stomach tissue to identify global changes in gene expression between tumor biopsies and normal tissue. These global changes in gene expression, also referred to as expression profiles, provide useful markers for diagnostic uses as well as markers that can be used to monitor disease states, disease progression, drug toxicity, drug efficacy and drug metabolism.

The gene expression profiles described herein were derived from normal and disease state stomach samples from five Korean patients between the ages of 47 and 68. The disease state associated with each sample is indicated in Table 2.

The present invention provides compositions and methods to detect the level of expression of genes that may be differentially expressed dependent upon the state of the cell, i.e., normal versus cancerous. These expression profiles of genes provide molecular tools for evaluating toxicity, drug efficacy, drug metabolism, development, and disease monitoring. Changes in the expression profile from a baseline profile can be used as an indication of such effects. Those skilled in the art can use any of a variety of known techniques to evaluate the expression of one or more of the genes and/or gene fragments identified in the instant application in order to observe changes in the expression profile in a tissue or sample of interest.

Definitions

In the description that follows, numerous terms and phrases known to those skilled in the art are used. In the interest of clarity and consistency of interpretation, the definitions of certain terms and phrases are provided.

As used herein, the phrase “detecting the level of expression” includes methods that quantify expression levels as well as methods that determine whether a gene of interest is expressed at all. Thus, an assay which provides a yes or no result without necessarily providing quantification of an amount of expression is an assay that requires “detecting the level of expression” as that phrase is used herein.

As used herein, the phrase “oligonucleotide sequences that are complementary to one or more of the genes described herein” refers to oligonucleotides that are capable of hybridizing under stringent conditions to at least part of the nucleotide sequence of said genes. Such hybridizable oligonucleotides will typically exhibit at least about 75% sequence identity at the nucleotide level to said genes, preferably about 80% or 85% sequence identity or more preferably about 90% or 95% or more nucleotide sequence identity to said genes.

“Bind(s) substantially” refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target polynucleotide sequence.

The terms “background” or “background signal intensity” refer to hybridization signals resulting from non-specific binding, or other interactions, between the labeled target nucleic acids and components of the oligonucleotide array (e.g., the oligonucleotide probes, control probes, the array substrate, etc.). Background signals may also be produced by intrinsic fluorescence of the array components themselves. A single background signal can be calculated for the entire array, or a different background signal may be calculated for each target nucleic acid. In a preferred embodiment, background is calculated as the average hybridization signal intensity for the lowest 5% to 10% of the probes in the array, or, where a different background signal is calculated for each target gene, for the lowest 5% to 10% of the probes for each gene. Of course, one of skill in the art will appreciate that where the probes to a particular gene hybridize well and thus appear to be specifically binding to a target sequence, they should not be used in a background signal calculation. Alternatively, background may be calculated as the average hybridization signal intensity produced by hybridization to probes that are not complementary to any sequence found in the sample (e.g., probes directed to nucleic acids of the opposite sense or to genes not found in the sample such as bacterial genes where the sample is mammalian nucleic acids). Background can also be calculated as the average signal intensity produced by regions of the array that lack any probes at all.
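The first background calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the plain list of probe intensities and the default fraction of 5% are hypothetical choices, not part of the disclosure, and excluding probes that hybridize well is left to the caller.

```python
from statistics import mean


def background_lowest_fraction(intensities, fraction=0.05):
    """Average signal of the lowest `fraction` of probes (e.g. 0.05 to 0.10).

    `intensities` is a hypothetical flat list of probe signal intensities,
    either for the whole array or for the probes of a single gene.
    """
    if not intensities:
        raise ValueError("no probe intensities supplied")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(intensities)
    # Keep at least one probe so that very small probe sets still give a value.
    count = max(1, int(len(ordered) * fraction))
    return mean(ordered[:count])


def background_from_controls(control_intensities):
    """Alternative: average signal of probes not complementary to the sample."""
    if not control_intensities:
        raise ValueError("no control probe intensities supplied")
    return mean(control_intensities)
```

Calling `background_lowest_fraction` once on all probes gives a single array-wide background, while calling it on the probes of each gene separately gives a per-gene background.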

The phrase “hybridizing specifically to” refers to the binding, duplexing or hybridizing of a molecule substantially to or only to a particular nucleotide sequence or sequences under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular) DNA or RNA.

Assays and methods of the invention may utilize available formats to simultaneously screen at least about 100, preferably about 1000, more preferably about 10,000 and most preferably about 1,000,000 or more different nucleic acid hybridizations.

The terms “mismatch control” or “mismatch probe” refer to a probe whose sequence is deliberately selected not to be perfectly complementary to a particular target sequence. For each mismatch (MM) control in a high-density array there typically exists a corresponding perfect match (PM) probe that is perfectly complementary to the same particular target sequence. The mismatch may comprise one or more bases that are not complementary to the corresponding bases of the target sequence.

While the mismatch(es) may be located anywhere in the mismatch probe, terminal mismatches are less desirable as a terminal mismatch is less likely to prevent hybridization of the target sequence. In a particularly preferred embodiment, the mismatch is located at or near the center of the probe such that the mismatch is most likely to destabilize the duplex with the target sequence under the test hybridization conditions.
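As a minimal sketch of a centrally placed mismatch, the following Python function derives a mismatch probe from a perfect match DNA probe by changing only the middle base. Replacing that base with its complement is an assumption made for illustration; the text above requires only that the mismatched base not be complementary to the target. The function name and the restriction to the bases A, C, G and T are likewise hypothetical.

```python
# Substitution table used for the central base (an illustrative choice).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def make_mismatch_probe(perfect_match):
    """Return a copy of a PM probe with a single mismatch at its center."""
    pm = perfect_match.upper()
    if not pm or any(base not in COMPLEMENT for base in pm):
        raise ValueError("probe must be a non-empty string of A, C, G, T")
    center = len(pm) // 2  # index 12 (the 13th base) for a 25-mer
    return pm[:center] + COMPLEMENT[pm[center]] + pm[center + 1:]
```

The resulting probe differs from the perfect match probe at exactly one position, so the pair can be synthesized side by side on the array.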

The term “perfect match probe” refers to a probe that has a sequence that is perfectly complementary to a particular target sequence. The test probe is typically perfectly complementary to a portion (subsequence) of the target sequence. The perfect match (PM) probe can be a “test probe”, a “normalization control” probe, an expression level control probe and the like. A perfect match control or perfect match probe is, however, distinguished from a “mismatch control” or “mismatch probe.”

As used herein a “probe” is defined as a nucleic acid, preferably an oligonucleotide, capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation. As used herein, a probe may include natural (i.e., A, G, U, C or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in probes may be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with hybridization. Thus, probes may be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages.

The term “stringent conditions” refers to conditions under which a probe will hybridize to its target subsequence, but with only insubstantial hybridization to other sequences or to other sequences such that the difference may be identified. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.

Typically, stringent conditions will be those in which the salt concentration is at least about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.

The “percentage of sequence identity” or “sequence identity” is determined by comparing two optimally aligned sequences or subsequences over a comparison window or span, wherein the portion of the polynucleotide sequence in the comparison window may optionally comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical subunit (e.g., nucleic acid base or amino acid residue) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity. Percentage sequence identity when calculated using the programs GAP or BESTFIT (see below) is calculated using default gap weights.
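The arithmetic in this definition can be illustrated with a short Python sketch. It assumes the two sequences have already been optimally aligned by some other tool and are supplied as equal-length strings in which a hyphen marks a gap; the function name and the gap character are hypothetical conventions. Gap columns count toward the window length but never as matches.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of window positions holding the identical subunit.

    Both arguments are pre-aligned strings of equal length covering the
    comparison window; alignment itself is outside this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Matched positions divided by total positions in the window, times 100.
    return 100.0 * matches / len(aligned_a)
```

For example, a four-position window with three identical positions gives 75% identity, whether the fourth position is a substitution or a gap.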

Homology or identity may be determined by BLAST (Basic Local Alignment Search Tool) analysis using the algorithm employed by the programs blastp, blastn, blastx, tblastn and tblastx (Karlin et al., Proc Natl Acad Sci USA 87:2264-2268, 1990 and Altschul, J Mol Evol 36:290-300, 1993, fully incorporated by reference) which are tailored for sequence similarity searching. The approach used by the BLAST program is to first consider similar segments between a query sequence and a database sequence, then to evaluate the statistical significance of all matches that are identified and finally to summarize only those matches which satisfy a preselected threshold of significance. For a discussion of basic issues in similarity searching of sequence databases, see Altschul et al., (Nature Genet 6:119-129, 1994) which is fully incorporated by reference. The search parameters for histogram, descriptions, alignments, expect (i.e., the statistical significance threshold for reporting matches against database sequences), cutoff, matrix and filter are at the default settings. The default scoring matrix used by blastp, blastx, tblastn, and tblastx is the BLOSUM62 matrix (Henikoff et al., Proc Natl Acad Sci USA 89:10915-10919, 1992, fully incorporated by reference). Four blastn parameters were adjusted as follows: Q=10 (gap creation penalty); R=10 (gap extension penalty); wink=1 (generates word hits at every wink^(th) position along the query); and gapw=16 (sets the window width within which gapped alignments are generated). The equivalent Blastp parameter settings were Q=9; R=2; wink=1; and gapw=32. A Bestfit comparison between sequences, available in the GCG package version 10.0, uses DNA parameters GAP=50 (gap creation penalty) and LEN=3 (gap extension penalty) and the equivalent settings in protein comparisons are GAP=8 and LEN=2.

Uses of Differentially Expressed Genes

The present invention identifies those genes that are differentially expressed between normal stomach tissue and cancerous stomach tissue. One of skill in the art can select one or more of the genes identified as being differentially expressed in Table 1 and use the information and methods provided herein to interrogate or test a particular sample. For a particular interrogation of two conditions or sources, it may be desirable to select those genes which display a great deal of difference in the expression pattern between the two conditions or sources. In other instances, it may be appropriate to select genes whose expression changes only slightly between the two conditions. A difference of at least about two-fold may be desirable, but a three-fold, five-fold or ten-fold difference may be preferred in some instances. Interrogations of the genes or proteins can be performed to yield different information.
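A fold-difference cutoff of this kind can be sketched as follows. The sketch assumes that expression levels are positive numbers on a linear scale and are held in plain dictionaries keyed by gene identifier; the function name, the signed-fold convention and the gene names used with it are hypothetical and are not taken from Table 1.

```python
def select_differential_genes(normal, tumor, min_fold=2.0):
    """Return {gene: signed fold} for genes changing at least `min_fold`.

    A positive value means higher expression in tumor tissue; a negative
    value means higher expression in normal tissue.
    """
    selected = {}
    for gene in normal.keys() & tumor.keys():
        n, t = normal[gene], tumor[gene]
        if n <= 0 or t <= 0:
            continue  # a ratio is undefined without positive signal in both
        fold = t / n if t >= n else -(n / t)
        if abs(fold) >= min_fold:
            selected[gene] = fold
    return selected
```

Raising `min_fold` to 3, 5 or 10 narrows the selection to the more strongly changed genes, in line with the preference stated above.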

Diagnostic Uses for the Stomach Cancer Markers

As described herein, the genes and gene expression information provided in Table 1 may be used as diagnostic markers for the prediction or identification of a disease state of stomach tissue. For instance, a stomach tissue sample or other sample from a patient may be assayed by any of the methods known to those skilled in the art, and the expression levels from one or more genes from Table 1 may be compared to the expression levels found in normal stomach tissue, cancerous stomach tissue or both. Expression profiles generated from the tissue or other samples that substantially resemble an expression profile from normal or diseased stomach tissue may be used, for instance, to aid in disease diagnosis. Comparison of the expression data, as well as available sequence or other information may be done by a researcher or diagnostician or may be done with the aid of a computer and databases as described herein.
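One way a computer might judge whether a sample profile "substantially resembles" a reference profile is sketched below. The use of the Pearson correlation coefficient as the resemblance measure is an assumption made for illustration only; the text does not prescribe any particular measure. The function names, the dictionary layout and the labels are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def closest_reference(sample, references):
    """Return (best label, {label: correlation}) over shared genes."""
    scores = {}
    for label, reference in references.items():
        genes = sorted(sample.keys() & reference.keys())
        scores[label] = pearson(
            [sample[g] for g in genes], [reference[g] for g in genes]
        )
    return max(scores, key=scores.get), scores
```

The returned scores are an aid to a researcher or diagnostician rather than a diagnosis in themselves; any threshold for "substantial" resemblance would have to be set separately.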

Use of the Stomach Cancer Markers for Monitoring Disease Progression

Molecular expression markers for stomach cancer can be used to confirm the type and progression of disease made on the basis of morphological criteria. For example, normal stomach tissue can be distinguished from cancerous stomach tissue based on the level and type of genes expressed in a tissue sample. In some situations, identification of cell type or source is ambiguous based on classical criteria. In these situations, the molecular expression markers of the present invention are useful for identifying the region of the stomach from which a sample came, as well as whether or not normal levels of gene expression have been altered (signs of metabolic disturbances).

In addition, progression of the carcinoma to new areas of the stomach or to other organs can be monitored by following the expression patterns of the involved genes using the molecular expression markers of the present invention. Monitoring of the efficacy of certain drug regimens can also be accomplished by following the expression patterns of the molecular expression markers.

As described above, the genes and gene expression information provided in Table 1 may also be used as markers for the direct monitoring of disease progression, for instance, the development of stomach cancer. A stomach tissue sample or other sample from a patient may be assayed by any of the methods known to those of skill in the art, and the expression levels in the sample from a gene or genes from Table 1 may be compared to the expression levels found in normal stomach tissue, tissue from a gastric carcinoma or both. Comparison of the expression data, as well as available sequence or other information may be done by a researcher or diagnostician or may be done with the aid of a computer and databases as described herein.

Use of the Stomach Cancer Markers for Drug Screening

According to the present invention, potential drugs can be screened to determine if application of the drug alters the expression of one or more of the genes identified herein. This may be useful, for example, in determining whether a particular drug is effective in treating a particular patient with stomach cancer. In the case where a gene's expression is affected by the potential drug such that its level of expression returns to normal, the drug is indicated in the treatment of stomach cancer. Similarly, a drug which causes expression of a gene which is not normally expressed by healthy stomach cells may be contra-indicated in the treatment of stomach cancer.

According to the present invention, the genes identified in Table 1 may also be used as markers to evaluate the effects of a candidate drug or agent on a cell, particularly a cell undergoing malignant transformation, for instance, a stomach cancer cell or tissue sample. A candidate drug or agent can be screened for the ability to stimulate the transcription or expression of a given marker or markers (drug targets) or to down-regulate or inhibit the transcription or expression of a marker or markers. According to the present invention, one can also compare the specificity of a drug's effects by looking at the number of markers affected by the drug and comparing them to the number of markers affected by a different drug. A more specific drug will affect fewer transcriptional targets. Similar sets of markers identified for two drugs indicate a similarity of effects.
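The comparison of two drugs by the markers they affect can be sketched as below. Two choices here are assumptions made for illustration: a marker is counted as affected when its expression changes at least two-fold in either direction between untreated and treated cells, and the similarity of two marker sets is summarized with the Jaccard index (shared markers divided by all markers affected by either drug). The function names and marker identifiers are hypothetical.

```python
def affected_markers(untreated, treated, min_fold=2.0):
    """Set of markers whose expression changes at least `min_fold`."""
    affected = set()
    for marker in untreated.keys() & treated.keys():
        u, t = untreated[marker], treated[marker]
        if u <= 0 or t <= 0:
            continue  # ratio undefined without positive signal in both
        if max(u, t) / min(u, t) >= min_fold:
            affected.add(marker)
    return affected


def marker_set_similarity(markers_a, markers_b):
    """Jaccard index of two marker sets (1.0 when both are empty)."""
    union = markers_a | markers_b
    return len(markers_a & markers_b) / len(union) if union else 1.0
```

Under these assumptions, the drug with the smaller affected set is the more specific of the two, and a similarity near 1 suggests that the two drugs have similar effects.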

Assays to monitor the expression of a marker or markers as defined in Table 1 may utilize any available means of monitoring for changes in the expression level of the nucleic acids of the invention. As used herein, an agent is said to modulate the expression of a nucleic acid of the invention if it is capable of up- or down-regulating expression of the nucleic acid in a cell.

Agents that are assayed in the above methods can be randomly selected or rationally selected or designed. As used herein, an agent is said to be randomly selected when the agent is chosen randomly without considering the specific sequences involved in the association of a protein of the invention alone or with its associated substrates, binding partners, etc. An example of randomly selected agents is the use of a chemical library or a peptide combinatorial library, or a growth broth of an organism.

As used herein, an agent is said to be rationally selected or designed when the agent is chosen on a nonrandom basis which takes into account the sequence of the target site and/or its conformation in connection with the agent's action. Agents can be selected or designed by utilizing the peptide sequences that make up these sites. For example, a rationally selected peptide agent can be a peptide whose amino acid sequence is identical to or a derivative of any functional consensus site.

The agents of the present invention can be, as examples, peptides, small chemical molecules, vitamin derivatives, as well as carbohydrates, lipids, oligonucleotides and covalent and non-covalent combinations thereof. Dominant negative proteins, DNA encoding these proteins, antibodies to these proteins, peptide fragments of these proteins or mimics of these proteins may be introduced into cells to affect function. “Mimic” as used herein refers to the modification of a region or several regions of a peptide molecule to provide a structure chemically different from the parent peptide but topographically and functionally similar to the parent peptide (see Grant, in Molecular Biology and Biotechnology, Meyers (ed.), VCH Publishers, 1995). A skilled artisan can readily recognize that there is no limit as to the structural nature of the agents of the present invention.

Use of the Stomach Cancer Markers as Therapeutic Agents

Agents that up- or down-regulate or modulate the expression of the nucleic acid molecules of Table 1, or at least one activity of a protein encoded by the nucleic acid molecules of Table 1, such as agonists or antagonists, may be used to modulate biological and pathologic processes associated with the function and activity of the proteins encoded by these nucleic acid molecules. The agents can be the nucleic acid molecules of Table 1 themselves, the encoded proteins, or portions of these molecules, such as all or part of the open reading frames of these nucleic acid molecules.

Anti-sense oligonucleotide molecules derived from the nucleic acid sequences of Table 1 may also be used to down-regulate the expression of one or more of the genes in Table 1 that are expressed at elevated levels in stomach cancer, the use of antisense gene therapy being an example. Down-regulation of expression of one or more of the genes of Table 1 is accomplished by administering an effective amount of antisense oligonucleotides. These antisense molecules can be fashioned from the DNA sequences of these genes or sequences containing various mutations, deletions, insertions or spliced variants. Isolated RNA or DNA sequences derived from these genes may also be used therapeutically in gene therapy. These agents may be used to induce gene expression in stomach cancers associated with an absence of or considerably decreased expression of one or more of the proteins encoded by genes in Table 1.

As used herein, a subject can be any mammal, so long as the mammal is in need of modulation of a pathological or biological process mediated by a gene of the invention. The term “mammal” is defined as an individual belonging to the class Mammalia. The invention is particularly useful in the treatment of human subjects.

Pathological processes refer to a category of biological processes which produce a deleterious effect. For example, expression of a gene of the invention may be associated with hyperplasia in the stomach, in particular malignant hyperplasia. As used herein, an agent is said to modulate a pathological process when the agent reduces the degree or severity of the process. For instance, stomach cancer may be prevented or disease progression modulated by the administration of agents which up- or down-regulate or modulate in some way the expression or at least one activity of a gene of the invention.

The agents of the present invention can be provided alone, or in combination with other agents that modulate a particular pathological process. For example, an agent of the present invention can be administered in combination with other known drugs. As used herein, two agents are said to be administered in combination when the two agents are administered simultaneously or are administered independently in a fashion such that the agents will act at the same time.

The agents of the present invention can be administered via parenteral, subcutaneous, intravenous, intramuscular, intraperitoneal, transdermal, or buccal routes. Alternatively, or concurrently, administration may be by the oral route. The dosage administered will be dependent upon the age, health, and weight of the recipient, kind of concurrent treatment, if any, frequency of treatment, and the nature of the effect desired.

The present invention further provides compositions containing one or more agents which modulate expression or at least one activity of a protein of the invention. While individual needs vary, determination of optimal ranges of effective amounts of each component is within the skill of the art. Typical dosages comprise 0.1 to 100 μg/kg body wt. The preferred dosages comprise 0.1 to 10 μg/kg body wt. The most preferred dosages comprise 0.1 to 1 μg/kg body wt.

In addition to the pharmacologically active agent, the compositions of the present invention may contain suitable pharmaceutically acceptable carriers comprising excipients and auxiliaries which facilitate processing of the active compounds into preparations which can be used pharmaceutically for delivery to the site of action. Suitable formulations for parenteral administration include aqueous solutions of the active compounds in water-soluble form, for example, water-soluble salts. In addition, suspensions of the active compounds as appropriate oily injection suspensions may be administered. Suitable lipophilic solvents or vehicles include fatty oils, e.g., sesame oil, or synthetic fatty acid esters, e.g., ethyl oleate or triglycerides. Aqueous injection suspensions may contain substances which increase the viscosity of the suspension, including, for example, sodium carboxymethyl cellulose, sorbitol, and/or dextran. Optionally, the suspension may also contain stabilizers. Liposomes can also be used to encapsulate the agent for delivery into the cell.

The pharmaceutical formulation for systemic administration according to the invention may be formulated for enteral, parenteral or topical administration. Indeed, all three types of formulations may be used simultaneously to achieve systemic administration of the active ingredient.

Suitable formulations for oral administration include hard or soft gelatin capsules, pills, tablets, including coated tablets, elixirs, suspensions, syrups or inhalations and controlled release forms thereof.

In practicing the methods of this invention, the compounds of this invention may be used alone or in combination, or in combination with other therapeutic or diagnostic agents. In certain preferred embodiments, the compounds of this invention may be coadministered along with other compounds typically prescribed for these conditions according to generally accepted medical practice. The compounds of this invention can be utilized in vivo, ordinarily in mammals, such as humans, rats, mice, dogs, cats, sheep, horses, cattle and pigs, or in vitro.

Assay Formats

The genes identified as being differentially expressed in stomach cancer may be used in a variety of nucleic acid detection assays to detect or quantify the expression level of a gene or multiple genes in a given sample. For example, traditional Northern blotting, nuclease protection, RT-PCR and differential display methods may be used for detecting gene expression levels. In methods where small numbers of genes are assayed, such as 5-50 genes, high-throughput PCR may be used.

The protein products of the genes identified herein can also be assayed to determine the amount of expression. Methods for assaying for a protein include Western blot, immunoprecipitation and radioimmunoassay. In some methods, it is preferable to assay the mRNA as an indication of expression. Methods for assaying for mRNA include Northern blots, slot blots, dot blots, and hybridization to an ordered array of oligonucleotides. Any method for specifically and quantitatively measuring a specific protein or mRNA or DNA product can be used. However, methods and assays of the invention are most efficiently designed with array or chip hybridization-based methods for detecting the expression of a large number of genes.

Any hybridization assay format may be used, including solution-based and solid support-based assay formats. A preferred solid support is a high density array, also known as a DNA chip or a gene chip. One variation of the DNA chip contains hundreds of thousands of discrete microscopic channels that pass completely through it. Probe molecules are attached to the inner surface of these channels, and molecules from the samples to be tested flow throughout the channels, coming into close proximity with the probes for hybridization. In one assay format, gene chips containing probes to at least two genes from Table 1 may be used to directly monitor or detect changes in gene expression in the treated or exposed cell as described herein.

The genes of the present invention may be assayed in any convenient sample form. For example, samples may be assayed in the form of mRNA or reverse transcribed mRNA. Samples may be cloned or not, and the samples or individual genes may be amplified or not. The cloning itself does not appear to bias the representation of genes within a population. However, it may be preferable to use polyA+ RNA as a source, as it can be used with fewer processing steps. In some embodiments, it may be preferable to assay the protein or peptide expressed by the gene.

The sequences of the expression marker genes of Table 1 are available in the public databases. Table 1 provides the Accession number, Sequence Number ID and name for each of the sequences. The sequences of the genes in GenBank are herein expressly incorporated by reference in their entirety (see www.ncbi.nlm.nih.gov).

Additional assay formats may be used to monitor the ability of the agent to modulate the expression of a gene identified in Table 1. For instance, as described above, mRNA expression may be monitored directly by hybridization of probes to the nucleic acids of the invention. Cell lines are exposed to an agent to be tested under appropriate conditions and time, and total RNA or mRNA is isolated by standard procedures such as those disclosed in Sambrook et al., Molecular Cloning: A Laboratory Manual, Third ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 2001. In some embodiments, it may be desirable to amplify one or more of the RNA molecules isolated prior to application of the RNA to the gene chip. Using techniques well known in the art, the RNA may be reverse transcribed and amplified in the form of DNA or may be reverse transcribed into DNA and the DNA used as a template for transcription to generate recombinant RNA. Any method that results in the production of a sufficient quantity of nucleic acid to be hybridized effectively to the gene chip may be used.

In another format, cell lines that contain reporter gene fusions between the open reading frame and/or the 3′ or 5′ regulatory regions of a gene in Table 1 and any assayable fusion partner may be prepared. Numerous assayable fusion partners are known and readily available including the firefly luciferase gene and the gene encoding chloramphenicol acetyltransferase (Alam et al., Anal Biochem 188:245-254, 1990). Cell lines containing the reporter gene fusions are then exposed to the agent to be tested under appropriate conditions and time. Differential expression of the reporter gene between samples exposed to the agent and control samples identifies agents which modulate the expression of the nucleic acid.

In another assay format, cells or cell lines are first identified which express one or more of the gene products of the invention physiologically. Cells and/or cell lines so identified would preferably comprise the necessary cellular machinery to ensure that the transcriptional and/or translational apparatus of the cells would faithfully mimic the response of normal or cancerous stomach tissue to an exogenous agent. Such machinery would likely include appropriate surface transduction mechanisms and/or cytosolic factors. Such cell lines may be, but are not required to be, derived from stomach tissue. The cells and/or cell lines may then be contacted with an agent and the expression of one or more of the genes of interest may then be assayed. The genes may be assayed at the mRNA level and/or at the protein level.

In some embodiments, such cells or cell lines may be transduced or transfected with an expression vehicle (e.g., a plasmid or viral vector) containing an expression construct comprising an operable 5′-promoter containing end of a gene of interest identified in Table 1 fused to one or more nucleic acid sequences encoding one or more antigenic fragments. The construct may comprise all or a portion of the coding sequence of the gene of interest which may be positioned 5′- or 3′- to a sequence encoding an antigenic fragment. The coding sequence of the gene of interest may be translated or un-translated after transcription of the gene fusion. At least one antigenic fragment may be translated. The antigenic fragments are selected so that the fragments are under the transcriptional control of the promoter of the gene of interest and are expressed in a fashion substantially similar to the expression pattern of the gene of interest. The antigenic fragments may be expressed as polypeptides whose molecular weight can be distinguished from the naturally occurring polypeptides.

In some embodiments, gene products of the invention may further comprise an immunologically distinct tag. Such a process is well known in the art (see Sambrook et al., supra). Cells or cell lines transduced or transfected as outlined above are then contacted with agents under appropriate conditions; for example, the agent comprises a pharmaceutically acceptable excipient and is contacted with cells comprised in an aqueous physiological buffer such as phosphate buffered saline (PBS) at physiological pH, Eagle's balanced salt solution (BSS) at physiological pH, PBS or BSS comprising serum or conditioned media comprising PBS or BSS and serum incubated at 37° C. Said conditions may be modulated as deemed necessary by one of skill in the art. Subsequent to contacting the cells with the agent, said cells will be disrupted and the polypeptides of the lysate are fractionated such that a polypeptide fraction is pooled and contacted with an antibody to be further processed by immunological assay (e.g., ELISA, immunoprecipitation or Western blot). The pool of proteins isolated from the “agent-contacted” sample will be compared with a control sample where only the excipient is contacted with the cells, and an increase or decrease in the immunologically generated signal from the “agent-contacted” sample compared to the control will be used to distinguish the effectiveness of the agent.

Another embodiment of the present invention provides methods for identifying agents that modulate the levels, concentration or at least one activity of a protein(s) encoded by the genes in Table 1. Such methods or assays may utilize any means of monitoring or detecting the desired activity.

In one format, the relative amounts of a protein of the invention produced in a cell population that has been exposed to the agent to be tested may be compared to the amount produced in an un-exposed control cell population. In this format, probes such as specific antibodies are used to monitor the differential expression of the protein in the different cell populations. Cell lines or populations are exposed to the agent to be tested under appropriate conditions and time. Cellular lysates may be prepared from the exposed cell line or population and a control, unexposed cell line or population. The cellular lysates are then analyzed with the probe, such as a specific antibody.

Probe Design

Probes based on the sequences of the genes described herein may be prepared by any commonly available method. Oligonucleotide probes for assaying the tissue or cell sample are preferably of sufficient length to specifically hybridize only to appropriate, complementary genes or transcripts. Typically the oligonucleotide probes will be at least 10, 12, 14, 16, 18, 20 or 25 nucleotides in length. In some cases longer probes of at least 30, 40, or 50 nucleotides will be desirable.

One of skill in the art will appreciate that an enormous number of array designs are suitable for the practice of this invention. The high density array will typically include a number of probes that specifically hybridize to the sequences of interest. See WO 99/32660 for methods of producing probes for a given gene or genes. In addition, in a preferred embodiment, the array will include one or more control probes.

High density array chips of the invention include “test probes.” Test probes may be oligonucleotides that range from about 5 to about 500 or about 5 to about 50 nucleotides, more preferably from about 10 to about 40 nucleotides and most preferably from about 15 to about 40 nucleotides in length. In other particularly preferred embodiments, the probes are about 20 or 25 nucleotides in length. In another preferred embodiment, test probes are double or single strand DNA sequences. DNA sequences may be isolated or cloned from natural sources or amplified from natural sources using natural nucleic acid as templates. These probes have sequences complementary to particular subsequences of the genes whose expression they are designed to detect. Thus, the test probes are capable of specifically hybridizing to the target nucleic acid they are to detect.

In addition to test probes that bind the target nucleic acid(s) of interest, the high density array can contain a number of control probes. The control probes fall into three categories referred to herein as (1) normalization controls; (2) expression level controls; and (3) mismatch controls.

Normalization controls are oligonucleotide or other nucleic acid probes that are complementary to labeled reference oligonucleotides or other nucleic acid sequences that are added to the nucleic acid sample. The signals obtained from the normalization controls after hybridization provide a control for variations in hybridization conditions, label intensity, “reading” efficiency and other factors that may cause the signal of a perfect hybridization to vary between arrays. In a preferred embodiment, signals (e.g., fluorescence intensity) read from all other probes in the array are divided by the signal (e.g., fluorescence intensity) from the control probes thereby normalizing the measurements.
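The normalization described above amounts to dividing each probe signal by the signal of the normalization controls. A minimal sketch follows; the probe identifiers and intensity values are hypothetical, and taking the mean of several control probes as the divisor is an assumption of this sketch rather than a requirement of the invention.

```python
from statistics import mean


def normalize(signals, control_ids):
    """Divide every non-control probe signal by the mean signal of the
    normalization control probes on the same array."""
    control_mean = mean(signals[probe] for probe in control_ids)
    if control_mean <= 0:
        raise ValueError("normalization controls gave no usable signal")
    return {
        probe: value / control_mean
        for probe, value in signals.items()
        if probe not in control_ids
    }


# Hypothetical fluorescence intensities read from one array.
raw = {"norm_1": 400.0, "norm_2": 600.0, "probe_x": 1000.0, "probe_y": 250.0}
normalized = normalize(raw, {"norm_1", "norm_2"})
```

Because each array is scaled by its own controls, normalized values from different arrays can be compared even when the overall label intensity or reading efficiency differs between them.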

Virtually any probe may serve as a normalization control. However, it is recognized that hybridization efficiency varies with base composition and probe length. Preferred normalization probes are selected to reflect the average length of the other probes present in the array; however, they can be selected to cover a range of lengths. The normalization control(s) can also be selected to reflect the (average) base composition of the other probes in the array; however, in a preferred embodiment, only one or a few probes are used and they are selected such that they hybridize well (i.e., no secondary structure) and do not match any target-specific probes.

Expression level controls are probes that hybridize specifically with constitutively expressed genes in the biological sample. Virtually any constitutively expressed gene provides a suitable target for expression level controls. Typical expression level control probes have sequences complementary to subsequences of constitutively expressed “housekeeping genes” including, but not limited to, the β-actin gene, the transferrin receptor gene, the GAPDH gene, and the like.

Mismatch controls may also be provided for the probes to the target genes, for expression level controls or for normalization controls. Mismatch controls are oligonucleotide probes or other nucleic acid probes identical to their corresponding test or control probes except for the presence of one or more mismatched bases. A mismatched base is a base selected so that it is not complementary to the corresponding base in the target sequence to which the probe would otherwise specifically hybridize. One or more mismatches are selected such that under appropriate hybridization conditions (e.g., stringent conditions) the test or control probe would be expected to hybridize with its target sequence, but the mismatch probe would not hybridize (or would hybridize to a significantly lesser extent). Preferred mismatch probes contain a central mismatch. Thus, for example, where a probe is a twenty-mer, a corresponding mismatch probe may have the identical sequence except for a single base mismatch (e.g., substituting a G, a C or a T for an A) at any of positions 6 through 14 (the central mismatch).
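For illustration, a mismatch probe with a single central substitution may be derived from a perfect match probe as sketched below. The probe sequence is hypothetical, and the particular substitution rule (A with T, G with C) and the default choice of the middle position are assumptions of this sketch; any base other than the original could be substituted.

```python
# Illustrative substitution rule: each base is replaced by a different base.
SWAP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(probe, position=None):
    """Return a copy of `probe` with one substituted base.

    `position` is 1-based. When omitted, the middle of the probe is used,
    which for a twenty-mer is position 10, inside the central window of
    positions 6 through 14."""
    probe = probe.upper()
    if position is None:
        position = len(probe) // 2
    if not 1 <= position <= len(probe):
        raise ValueError("position lies outside the probe")
    i = position - 1
    return probe[:i] + SWAP[probe[i]] + probe[i + 1:]


perfect_match = "ACGTACGTACGTACGTACGT"  # hypothetical twenty-mer
mismatch = mismatch_probe(perfect_match)
differences = sum(a != b for a, b in zip(perfect_match, mismatch))
```

The resulting probe differs from the perfect match probe at exactly one position, so that any difference in hybridization signal between the pair can be attributed to that single base.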

Mismatch probes thus provide a control for non-specific binding or cross hybridization to a nucleic acid in the sample other than the target to which the probe is directed. Mismatch probes also indicate whether a hybridization is specific or not. For example, if the target is present the perfect match probes should be consistently brighter than the mismatch probes. In addition, if all central mismatches are present, the mismatch probes can be used to detect a mutation. The difference in intensity between the perfect match and the mismatch probe (I(PM)-I(MM)) provides a good measure of the concentration of the hybridized material.
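The I(PM)-I(MM) difference can be computed per probe pair as sketched below. The intensity values are hypothetical, and averaging the differences over several probe pairs for one gene is an assumption of this sketch, offered as one common way of summarizing the pairs.

```python
def average_difference(pairs):
    """Mean of I(PM) - I(MM) over the probe pairs for one gene."""
    diffs = [pm - mm for pm, mm in pairs]
    return sum(diffs) / len(diffs)


def fraction_pm_brighter(pairs):
    """Share of probe pairs in which the perfect match probe is brighter
    than its mismatch probe, a rough indicator of specific hybridization."""
    return sum(pm > mm for pm, mm in pairs) / len(pairs)


# Hypothetical (I(PM), I(MM)) intensities for four probe pairs of one gene.
pairs = [(1200.0, 300.0), (900.0, 250.0), (1500.0, 400.0), (200.0, 220.0)]
avg_diff = average_difference(pairs)
brighter = fraction_pm_brighter(pairs)
```

A large positive average difference together with a high fraction of pairs in which the perfect match probe is brighter is consistent with the target being present.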

Nucleic Acid Samples

As is apparent to one of ordinary skill in the art, nucleic acid samples used in the methods and assays of the invention may be prepared by any available method or process. Methods of isolating total mRNA are also well known to those of skill in the art. For example, methods of isolation and purification of nucleic acids are described in detail in Chapter 3 of Laboratory Techniques in Biochemistry and Molecular Biology Vol. 24, Hybridization With Nucleic Acid Probes: Theory and Nucleic Acid Probes, P. Tijssen (ed.) Elsevier Press, New York, 1993. Such samples include RNA samples, but also include cDNA synthesized from a mRNA sample isolated from a cell or tissue of interest. Such samples also include DNA amplified from the cDNA, and an RNA transcribed from the amplified DNA. One of skill in the art would appreciate that it may be desirable to inhibit or destroy RNase present in homogenates before homogenates can be used.

Biological samples may be of any biological tissue or fluid or cells from any organism as well as cells raised in vitro, such as cell lines and tissue culture cells. Frequently the sample will be a “clinical sample” which is a sample derived from a patient. Typical clinical samples include, but are not limited to, stomach tissue biopsy, sputum, blood, blood-cells (e.g., white cells), tissue or fine needle biopsy samples, urine, peritoneal fluid, and pleural fluid, or cells therefrom. Biological samples may also include sections of tissues, such as frozen sections or formalin fixed sections taken for histological purposes.

Solid Supports

Solid supports containing oligonucleotide probes for differentially expressed genes can be any solid or semisolid support material known to those skilled in the art. Suitable examples include, but are not limited to, membranes, filters, tissue culture dishes, polyvinyl chloride dishes, beads, test strips, silicon or glass based chips and the like. Suitable glass wafers and hybridization methods are widely available, for example, those disclosed by Beattie (WO 95/11755). Any solid surface to which oligonucleotides can be bound, either directly or indirectly, either covalently or non-covalently, can be used. In some embodiments, it may be desirable to attach some oligonucleotides covalently and others non-covalently to the same solid support.

A preferred solid support is a high density array or DNA chip. These contain a particular oligonucleotide probe in a predetermined location on the array. Each predetermined location may contain more than one molecule of the probe, but each molecule within the predetermined location has an identical sequence. Such predetermined locations are termed features. There may be, for example, from 2, 10, 100, 1000 to 10,000, 100,000 or 400,000 of such features on a single solid support. The solid support, or the area within which the probes are attached, may be on the order of a square centimeter.

Oligonucleotide probe arrays for expression monitoring can be made and used according to any techniques known in the art (see for example, Lockhart et al., Nat Biotechnol 14:1675-1680, 1996; McGall et al., Proc Natl Acad Sci USA 93:13555-13560, 1996). Such probe arrays may contain at least two or more oligonucleotides that are complementary to or hybridize to two or more of the genes described herein. Such arrays may also contain oligonucleotides that are complementary or hybridize to at least 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 70 or more of the genes described herein.

Methods of forming high density arrays of oligonucleotides with a minimal number of synthetic steps are known. The oligonucleotide analogue array can be synthesized on a solid substrate by a variety of methods, including, but not limited to, light-directed chemical coupling, and mechanically directed coupling (see Pirrung et al., (1992) U.S. Pat. No. 5,143,854; Fodor et al., (1998) U.S. Pat. No. 5,800,992; Chee et al., (1998) U.S. Pat. No. 5,837,832).

In brief, the light-directed combinatorial synthesis of oligonucleotide arrays on a glass surface proceeds using automated phosphoramidite chemistry and chip masking techniques. In one specific implementation, a glass surface is derivatized with a silane reagent containing a functional group, e.g., a hydroxyl or amine group blocked by a photolabile protecting group. Photolysis through a photolithographic mask is used selectively to expose functional groups which are then ready to react with incoming 5′ photoprotected nucleoside phosphoramidites. The phosphoramidites react only with those sites which are illuminated (and thus exposed by removal of the photolabile blocking group). Thus, the phosphoramidites only add to those areas selectively exposed from the preceding step. These steps are repeated until the desired array of sequences has been synthesized on the solid surface. Combinatorial synthesis of different oligonucleotide analogues at different locations on the array is determined by the pattern of illumination during synthesis and the order of addition of coupling reagents.

In addition to the foregoing, additional methods which can be used to generate an array of oligonucleotides on a single substrate are described in Fodor et al. WO 93/09668. High density nucleic acid arrays can also be fabricated by depositing pre-made or natural nucleic acids in predetermined positions. Synthesized or natural nucleic acids are deposited on specific locations of a substrate by light directed targeting and oligonucleotide directed targeting. Another embodiment uses a dispenser that moves from region to region to deposit nucleic acids in specific spots.

Hybridization

Nucleic acid hybridization simply involves contacting a probe and target nucleic acid under conditions where the probe and its complementary target can form stable hybrid duplexes through complementary base pairing (see Lockhart et al., (1999) WO 99/32660). The nucleic acids that do not form hybrid duplexes are then washed away leaving the hybridized nucleic acids to be detected, typically through detection of an attached detectable label. It is generally recognized that nucleic acids are denatured by increasing the temperature or decreasing the salt concentration of the buffer containing the nucleic acids. Under low stringency conditions (e.g., low temperature and/or high salt) hybrid duplexes (e.g., DNA-DNA, RNA-RNA or RNA-DNA) will form even where the annealed sequences are not perfectly complementary. Thus, specificity of hybridization is reduced at lower stringency. Conversely, at higher stringency (e.g., higher temperature or lower salt) successful hybridization requires fewer mismatches. One of skill in the art will appreciate that hybridization conditions may be selected to provide any degree of stringency. In a preferred embodiment, hybridization is performed at low stringency, in this case in 6× SSPE-T at 37° C. (0.005% Triton X-100) to ensure hybridization and then subsequent washes are performed at higher stringency (e.g., 1× SSPE-T at 37° C.) to eliminate mismatched hybrid duplexes. Successive washes may be performed at increasingly higher stringency (e.g., down to as low as 0.25× SSPE-T at 37° C. to 50° C.) until a desired level of hybridization specificity is obtained. Stringency can also be increased by addition of agents such as formamide. Hybridization specificity may be evaluated by comparison of hybridization to the test probes with hybridization to the various controls that can be present (e.g., expression level control, normalization control, mismatch controls, etc.).

In general, there is a tradeoff between hybridization specificity (stringency) and signal intensity. Thus, in a preferred embodiment, the wash is performed at the highest stringency that produces consistent results and that provides a signal intensity greater than approximately 10% of the background intensity. Thus, in a preferred embodiment, the hybridized array may be washed at successively higher stringency solutions and read between each wash. Analysis of the data sets thus produced will reveal a wash stringency above which the hybridization pattern is not appreciably altered and which provides adequate signal for the particular oligonucleotide probes of interest.
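The selection of a wash stringency from a series of successive reads may be sketched as follows. The wash labels, intensities and the boolean consistency flag are hypothetical; how consistency between reads is judged is left open above, so this sketch simply assumes that judgment has already been made for each wash.

```python
def pick_wash(washes, min_ratio=0.10):
    """Return the label of the highest-stringency wash that gave consistent
    results and a signal above `min_ratio` times the background.

    `washes` is ordered from lowest to highest stringency; each entry is
    (label, signal, background, consistent). Returns None if no wash
    qualifies."""
    chosen = None
    for label, signal, background, consistent in washes:
        if consistent and signal > min_ratio * background:
            chosen = label  # later entries are more stringent
    return chosen


# Hypothetical reads taken after each successive wash of one array.
washes = [
    ("1x SSPE-T", 900.0, 100.0, True),
    ("0.5x SSPE-T", 400.0, 100.0, True),
    ("0.25x SSPE-T", 8.0, 100.0, True),
]
selected = pick_wash(washes)
```

In this hypothetical series the most stringent wash leaves too little signal, so the intermediate wash would be selected.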

Signal Detection

The hybridized nucleic acids are typically detected by detecting one or more labels attached to the sample nucleic acids. The labels may be incorporated by any of a number of means well known to those of skill in the art (see Lockhart et al., (1999) WO 99/32660).

Databases

The present invention includes relational databases containing sequence information, for instance for one or more of the genes of Table 1, as well as gene expression information in various stomach tissue samples. Databases may also contain information associated with a given sequence or tissue sample such as descriptive information about the gene associated with the sequence information, descriptive information concerning the clinical status of the tissue sample, or information concerning the patient from which the sample was derived. The database may be designed to include different parts, for instance a sequence database and a gene expression database. The databases of the invention may be stored on any available computer-readable medium. Methods for the configuration and construction of such databases are widely available, for instance, see Akerblom et al., (U.S. Pat. No. 5,953,727), which is specifically incorporated herein by reference in its entirety.
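One possible layout for such a relational database, with a sequence part and a gene expression part, is sketched below using an in-memory SQLite database. The table names, column names, accession placeholder and expression values are all hypothetical and are chosen only to illustrate the kinds of information described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Sequence part: one row per marker gene.
    CREATE TABLE gene (
        gene_id   INTEGER PRIMARY KEY,
        accession TEXT UNIQUE NOT NULL,
        name      TEXT
    );
    -- Descriptive information about each tissue sample.
    CREATE TABLE sample (
        sample_id       INTEGER PRIMARY KEY,
        tissue          TEXT NOT NULL,
        clinical_status TEXT
    );
    -- Gene expression part: one measurement per gene per sample.
    CREATE TABLE expression (
        gene_id   INTEGER NOT NULL REFERENCES gene(gene_id),
        sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
        level     REAL NOT NULL,
        PRIMARY KEY (gene_id, sample_id)
    );
    """
)

# Placeholder records; none of these values come from Table 1.
conn.execute("INSERT INTO gene VALUES (1, 'ACC_PLACEHOLDER', 'hypothetical gene')")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [(1, "stomach", "tumor"), (2, "stomach", "normal")],
)
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [(1, 1, 820.0), (1, 2, 95.0)],
)

# Expression of one gene across samples, grouped by clinical status.
rows = conn.execute(
    """
    SELECT s.clinical_status, e.level
    FROM expression AS e
    JOIN gene AS g ON g.gene_id = e.gene_id
    JOIN sample AS s ON s.sample_id = e.sample_id
    WHERE g.accession = ?
    ORDER BY s.clinical_status
    """,
    ("ACC_PLACEHOLDER",),
).fetchall()
```

Keeping sequence information and expression measurements in separate tables linked by a gene identifier mirrors the division into a sequence database and a gene expression database described above.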

The databases of the invention may be linked to an outside or external database. In a preferred embodiment, as described in Table 1, the external database is GenBank and the associated databases maintained by the National Center for Biotechnology Information or NCBI (http://www.ncbi.nlm.nih.gov/Entrez/). Other external databases that may be used in the invention include those provided by Chemical Abstracts Service (http://stnweb.cas.org/) or Incyte Genomics (http://www.incyte.com/sequence/index.shtml).

Any appropriate computer platform may be used to perform the necessary comparisons between sequence information, gene expression information and any other information in the database or provided as an input. For example, a large number of computer workstations are available from a variety of manufacturers, such as those available from Silicon Graphics. Client-server environments, database servers and networks are also widely available and appropriate platforms for the databases of the invention.

The databases of the invention may be used to produce, among otherthings, electronic Northern blots (E-Northerns) to allow the user todetermine the cell type or tissue in which a given gene is expressed andto allow determination of the abundance or expression level of a givengene in a particular tissue or cell. The E-northern analysis can be usedas a tool to discover tissue specific candidate therapeutic targets thatare not over-expressed in tissues such as the liver, kidney, or heart.These tissue types often lead to detrimental side effects once drugs aredeveloped and a first-pass screen to eliminate these targets early inthe target discovery and validation process would be beneficial.

The databases of the invention may also be used to present informationidentifying the expression level in a tissue or cell of a set of genescomprising at least one gene in Table 1, comprising the step ofcomparing the expression level of at least one gene in Table 1 in thetissue to the level of expression of the gene in the database. Suchmethods may be used to predict the physiological state of a given tissueby comparing the level of expression of a gene or genes in Table 1 froma sample to the expression levels found in tissue from normal stomachtissue, tissue from stomach tumors or both. Such methods may also beused in the drug or agent screening assays as described herein.

Without further description, it is believed that one of ordinary skillin the art can, using the preceding description and the followingillustrative examples, make and utilize the compounds of the presentinvention and practice the claimed methods. The preceding workingexamples therefore, are illustrative only and should not be construed aslimiting in any way the scope of the invention.

EXAMPLES

Example 1

Preparation of Stomach Cancer Profiles

Tissue Sample Acquisition and Preparation

The patient tissue samples were derived from five Korean patients, four men and one woman, aged 47 to 68, who had been diagnosed with advanced gastric cancer (AGC). For each patient, tissue was obtained from two areas of the stomach to produce a set of biopsy samples. Tissue was removed from a gastric tumor and from the non-cancerous surrounding area composed of normal stomach tissue (NOR).

Histological analysis of each of the tissue samples was performed and samples were segregated into either normal (NOR) or cancerous (AGC) categories.

With minor modifications, the sample preparation protocol followed the Affymetrix GeneChip Expression Analysis Manual. Frozen tissue was first ground to powder using the Spex Certiprep 6800 Freezer Mill. Total RNA was then extracted using Trizol (Life Technologies). The total RNA yield for each sample (average tissue weight of 300 mg) was 200-500 μg. Next, mRNA was isolated using the Oligotex mRNA Midi kit (Qiagen). Since the mRNA was eluted in a final volume of 400 μl, an ethanol precipitation step was required to bring the concentration to 1 μg/μl. Using 1-5 μg of mRNA, double stranded cDNA was created using the SuperScript Choice system (Gibco-BRL). First strand cDNA synthesis was primed with a T7-(dT₂₄) oligonucleotide. The cDNA was then phenol-chloroform extracted and ethanol precipitated to a final concentration of 1 μg/μl.

From 2 μg of cDNA, cRNA was synthesized according to standard procedures. To biotin label the cRNA, nucleotides Bio-11-CTP and Bio-16-UTP (Enzo Diagnostics) were added to the reaction. After a 37° C. incubation for six hours, the labeled cRNA was cleaned up according to the RNeasy Mini kit protocol (Qiagen). The cRNA was then fragmented (5× fragmentation buffer: 200 mM Tris-Acetate (pH 8.1), 500 mM KOAc, 150 mM MgOAc) for thirty-five minutes at 94° C.

55 μg of fragmented cRNA was hybridized on the human and the Human Genome U95 set of arrays for twenty-four hours at 60 rpm in a 45° C. hybridization oven. The chips were washed and stained with Streptavidin Phycoerythrin (SAPE) (Molecular Probes) in Affymetrix fluidics stations. To amplify staining, SAPE solution was added twice with an anti-streptavidin biotinylated antibody (Vector Laboratories) staining step in between. Hybridization to the probe arrays was detected by fluorometric scanning (Hewlett Packard Gene Array Scanner). Following hybridization and scanning, the microarray images were analyzed for quality control, looking for major chip defects or abnormalities in hybridization signal. After all chips passed QC, the data was analyzed using Affymetrix GeneChip software (v3.0), and Experimental Data Mining Tool (EDMT) software (v1.0).

Gene Expression Analysis

All samples were prepared as described and hybridized onto the Affymetrix Human Genome U95 array. Each chip contains 16-20 oligonucleotide probe pairs per gene or cDNA clone. These probe pairs include perfectly matched sets and mismatched sets, both of which are necessary for the calculation of the average difference. The average difference is a measure of the intensity difference for each probe pair, calculated by subtracting the intensity of the mismatch from the intensity of the perfect match. This takes into consideration variability in hybridization among probe pairs and other hybridization artifacts that could affect the fluorescence intensities. Using the average difference value that has been calculated, an absolute call for each gene is made.
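The average difference calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the intensity values are invented for the example, and the sketch omits any probe-pair exclusion or scaling that the GeneChip software may apply.

```python
def average_difference(probe_pairs):
    """Mean of (perfect match - mismatch) intensity over a gene's probe pairs."""
    if not probe_pairs:
        raise ValueError("at least one probe pair is required")
    diffs = [pm - mm for pm, mm in probe_pairs]
    return sum(diffs) / len(diffs)


# Invented (PM, MM) intensities for three probe pairs; real probe sets on the
# array described above contain 16-20 pairs per gene.
pairs = [(520.0, 120.0), (480.0, 180.0), (300.0, 100.0)]
print(average_difference(pairs))
```

A probe pair whose mismatch signal exceeds its perfect-match signal contributes a negative difference, which is how non-specific hybridization lowers the value for a gene.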

The absolute call of present, absent or marginal is used to generate a Gene Signature, a tool used to identify those genes that are commonly present or commonly absent in a given sample set, according to the absolute call.

The Gene Signature Curve is a graphic view of the number of genes consistently present in a given set of samples as the sample size increases, taking into account the genes commonly expressed among a particular set of samples, and discounting those genes whose expression is variable among those samples. The curve is also indicative of the number of samples necessary to generate an accurate Gene Signature. As the sample number increases, the number of genes common to the sample set decreases. The curve is generated using the positive Gene Signatures of the samples in question, determined by adding one sample at a time to the Gene Signature, beginning with the sample with the smallest number of present genes and adding samples in ascending order. The curve displays the sample size required for the most consistency and the least amount of expression variability from sample to sample. The point where this curve begins to level off represents the minimum number of samples required for the Gene Signature. Graphed on the x-axis is the number of samples in the set, and on the y-axis is the number of genes in the positive Gene Signature. As a general rule, the acceptable percent of variability in the number of positive genes between two sample sets should be less than 5%.
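The construction of the curve described above can be sketched as follows. The sketch assumes each sample is represented by the set of genes called present in it; the function name, sample labels and gene labels are hypothetical.

```python
def gene_signature_curve(present_calls):
    """Compute (sample_count, common_gene_count) points for the curve.

    present_calls maps a sample ID to the set of genes called present in it.
    """
    # Begin with the sample having the fewest present genes, then ascend.
    ordered = sorted(present_calls.values(), key=len)
    curve = []
    common = None
    for count, genes in enumerate(ordered, start=1):
        # Keep only the genes present in every sample added so far.
        common = set(genes) if common is None else common & genes
        curve.append((count, len(common)))
    return curve


# Invented present-call sets for three samples.
samples = {
    "s1": {"A", "B", "C", "D"},
    "s2": {"A", "B", "C"},
    "s3": {"A", "B", "E", "F", "G"},
}
print(gene_signature_curve(samples))
```

Because each step intersects the running set with one more sample, the y-values can only stay level or fall, which matches the statement that the number of common genes decreases as samples are added.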

For the purposes of this study, the following statistical methods were used for the data analysis. A gene set consists of genes that have a certain percentage of present calls in at least one group of samples. These genes are analyzed, and others are excluded. For example, a gene having 40% present calls (2 out of 5 samples) in at least one sample group, cancerous stomach cells or non-cancerous stomach cells, is included in the analysis if 40% is above the lower limit for percent present calls. Also, the genes are divided into two groups depending on their expression values across samples. For the genes in the high expression group, the average difference value is transformed to log scale before the analysis. For the genes in the low expression group, the original values are used in the analysis. An Analysis of Variance (ANOVA) method is used for data analysis (Steel et al., Principles and Procedures of Statistics: A Biometrical Approach, Third Ed., McGraw-Hill, 1997). Prior to the final analysis, a leave-one-out approach is used for outlier detection. One sample is left out of the ANOVA analysis to see whether omitting a specific sample from the analysis has any significant effect on the final result. If so, that particular sample is excluded from the final analysis. After outlier detection, the final analysis produces a list of genes that are differentially expressed with a p-value ≤0.001 as determined by the contrast from the ANOVA.
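Two of the steps described above, the present-call filter and the ANOVA itself, can be sketched in Python. This is an illustration under stated assumptions, not the software actually used. The function names, the 'P'/'A'/'M' call encoding and the example values are hypothetical. The sketch computes only the one-way ANOVA F statistic; converting F to a p-value requires the F distribution and is left out, as are the log transformation and the leave-one-out step.

```python
from statistics import mean


def passes_present_filter(calls_by_group, min_fraction):
    """True if any sample group reaches the minimum fraction of present calls.

    calls_by_group maps a group name to a list of 'P', 'A' or 'M' calls.
    """
    return any(
        calls.count("P") / len(calls) >= min_fraction
        for calls in calls_by_group.values()
    )


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of expression values."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Invented calls: 2 of 5 present in the cancerous group, none in the normal group.
calls = {"AGC": ["P", "P", "A", "A", "A"], "NOR": ["A"] * 5}
print(passes_present_filter(calls, 0.4))
print(anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```

With only two groups, as in the AGC versus NOR comparison, the F statistic equals the square of the two-sample t statistic with pooled variance.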

Differentially expressed genes were discovered by comparing biopsy samples from different regions of the same stomach in patients with advanced stomach cancer (advanced gastric cancer). Gene expression levels in a patient's stomach tumor cells were compared to those in the patient's normal stomach cells (AGC samples vs. NOR samples). Genes which showed no difference in expression level between a diseased state and the normal control were not included in Table 1. Table 1 (33 genes) lists the genes that were found to be differentially expressed when the level in cancerous stomach cells was compared to the level in normal stomach cells.

Fold Change analysis

The data was first filtered to exclude all genes that showed no expression in any of the samples. The ratio (cancerous/normal) was calculated by comparing the mean expression value for each gene in a cancerous sample set against the mean expression value of that gene in the normal tissue sample set. Genes were included in the analysis if they had a fold change ≥1.9 in either direction, and a p-value <0.00097 as determined by an Analysis of Variance Test (ANOVA). Out of the ~60,000 genes surveyed by the Human Genome U95 set, 33 genes were present in the overall fold change analysis. In Table 1, numbers representing a comparison, or fold change, between the level of expression of a gene in disease state versus normal biopsy samples can be positive or negative. Positive values indicate a higher expression level in the tumor sample compared to the control (up-regulation), while negative values indicate a lower expression level in the tumor sample (down-regulation).
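The fold change calculation described above can be sketched as follows. The sign convention used here, reporting the negated inverse ratio when expression is lower in the cancerous set, is an assumption inferred from the values in Table 1 rather than something the text states outright. The function names are hypothetical; the thresholds are those given above.

```python
def signed_fold_change(mean_normal, mean_cancer):
    """Cancer/normal ratio when >= 1; otherwise the negated inverse ratio."""
    if mean_normal <= 0 or mean_cancer <= 0:
        raise ValueError("mean expression values must be positive")
    ratio = mean_cancer / mean_normal
    return ratio if ratio >= 1 else -1 / ratio


def passes_filter(fold, p_value, min_fold=1.9, max_p=0.00097):
    """Apply the fold change and p-value cutoffs described in the text."""
    return abs(fold) >= min_fold and p_value < max_p


# Mean values for two Table 1 entries (89128_at and 32570_at).
up = signed_fold_change(39.95, 305.49)
down = signed_fold_change(601.07, 82.43)
print(round(up, 2), round(down, 2))
```

Under this convention a value of 1 means no change, and no gene can have a fold change strictly between -1 and 1.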

Expression Profiles of Genes Differentially Expressed in Stomach Cancer

Using the above described methods, genes that were predominantly over-expressed in stomach cancer, or predominantly under-expressed in stomach cancer, were identified. Genes with consistent differential expression patterns provide potential targets for broad range diagnostics and therapeutics.

Table 1 lists the set of genes determined to be differentially expressed in cancerous stomach tissue compared to normal stomach tissue, with the fold change value for each gene. This set of genes, along with the relative expression levels of each gene, creates a profile for the disease examined. A profile produced by a subset of these genes can also be of diagnostic or prognostic value, if examination of a patient's stomach biopsy sample shows up-regulation and/or down-regulation of a subset of the genes in Table 1. Likewise, in addition to their diagnostic and monitoring uses, a subset of the genes of Table 1 can be used to screen therapeutic agents or used in pharmaceutical compositions as therapeutic agents.

These genes or subsets of the genes of Table 1 confirm an overall stomach cancer gene expression profile. The genes in Table 1 may be used alone, or in combination with the methods, compositions, databases and computer systems of the invention.

Although the present invention has been described in detail with reference to examples above, it is understood that various modifications can be made without departing from the spirit of the invention. Accordingly, the invention is limited only by the following claims. All cited patents and publications referred to in this application are herein incorporated by reference in their entirety.

TABLE 1 Genes Differentially Expressed in Stomach Cancer

Fragment Name  Seq. ID  Accession Number  UniGene ID  Gene Symbol
89128_at       1        AI968491          Hs.236516   MINCLE
80846_at       2        AI198352          Hs.7165     ZNF259
51709_at       3        AI925439          Hs.194694   MAP3K6
34345_at       4        AF026031          Hs.31334    TOM
57342_at       5        AA524258          Hs.279862   TOK-1
40076_at       6        AF004430          Hs.154718   TPD52L2
45723_at       7        AI805297          Hs.24135    DKFZp761C241
32849_at       8        D80000            Hs.211602   SMC1L1
43809_at       9        AI984100          Hs.178761   POH1
40036_at       10       AF035940          Hs.57904    MAGOH
65586_at       11       AI951998          Hs.65588    DAZAP1
38964_r_at     12       U12707            Hs.2157     WAS
37700_at       13       X92106            Hs.78943    BLMH
36968_s_at     14       AL050353          Hs.274170   OIP2
31879_at       15       U69127            Hs.153636   FUBP3
37766_s_at     16       AF035309          Hs.79387    PSMC5
33150_at       17       AI126004          Hs.322901   SAS10
49015_at       18       AA149864          Hs.22393    DENR
37490_at       19       L27213            Hs.1176     SLC4A3
34845_at       20       AL035398          Hs.4877     CGI-51
37189_at       21       AL023553          Hs.75835    PMM1
34986_at       22       AF030455          Hs.116651   EVA1
912_s_at       23       M21056            Hs.992      PLA2G1B
56491_at       24       AL079368          Hs.44017    SIR2L
41071_at       25       X57655            Hs.98243    SPINK2
37874_at       26       Z47553            Hs.14286    FMO5
34506_at       27       M13928            Hs.1227     ALAD
47490_at       28       AL123839          Hs.132957   LOC64102
32177_s_at     29       AC004084          Hs.184367   GAPL
35925_at       30       AF040639          Hs.284236   AKR7A3
32570_at       31       L76465            Hs.77348    HPGD
34070_s_at     32       Z84717            Hs.306284   PDI
926_at         33       J03910            Hs.334409   MT1G

Fragment Name  Description
89128_at       C-type (calcium dependent, carbohydrate-recognition domain) lectin, superfamily member 9
80846_at       zinc finger protein 259
51709_at       mitogen-activated protein kinase kinase kinase 6
34345_at       putative mitochondrial outer membrane protein import receptor
57342_at       cdk inhibitor p21 binding protein
40076_at       tumor protein D52-like 2
45723_at       transmembrane protein vezatin; hypothetical protein DKFZp761C241
32849_at       SMC1 (structural maintenance of chromosomes 1, yeast)-like 1
43809_at       26S proteasome-associated pad1 homolog
40036_at       mago-nashi (Drosophila) homolog, proliferation-associated
65586_at       DAZ associated protein 1
38964_r_at     Wiskott-Aldrich syndrome (eczema-thrombocytopenia)
37700_at       bleomycin hydrolase
36968_s_at     Opa-interacting protein 2
31879_at       far upstream element (FUSE) binding protein 3
37766_s_at     protease (prosome, macropain) 26S subunit, ATPase 5, proteasome (prosome, macropain) 26S subunit, ATPase, 5
33150_at       disrupter of silencing 10
49015_at       density-regulated protein
37490_at       solute carrier family 4, anion exchanger, member 3
34845_at       CGI-51 protein
37189_at       phosphomannomutase 1
34986_at       epithelial V-like antigen 1
912_s_at       phospholipase A2, group IB (pancreas)
56491_at       sirtuin (silent mating type information regulation 2, S. cerevisiae, homolog) 2
41071_at       serine protease inhibitor, Kazal type, 2 (acrosin-trypsin inhibitor)
37874_at       flavin containing monooxygenase 5
34506_at       aminolevulinate, delta-, dehydratase
47490_at       tenomodulin protein
32177_s_at     GTPase activating protein-like
35925_at       aldo-keto reductase family 7, member A3 (aflatoxin aldehyde reductase)
32570_at       hydroxyprostaglandin dehydrogenase 15-(NAD)
34070_s_at     protein disulfide isomerase
926_at         metallothionein 1G

Fragment Name  AGC/NOR  P-value   Mean-Normals  Mean-AGC
89128_at       7.65     0.000198  39.95         305.49
80846_at       6.78     0.000506  27.64         187.33
51709_at       6.43     0.000577  20            128.69
34345_at       3.77     0.000591  44.96         169.39
57342_at       3.41     0.000375  28.73         98.11
40076_at       3.03     0.000486  92.39         279.87
45723_at       2.92     0.000064  24.63         71.87
32849_at       2.84     0.000775  33.5          95.24
43809_at       2.64     0.00065   239.5         632.92
40036_at       2.50     0.000579  22.25         55.64
65586_at       2.45     0.000854  89.46         219.17
38964_r_at     2.41     0.000717  20            48.2
37700_at       2.33     0.000087  37.66         87.74
36968_s_at     2.22     0.000836  40.98         90.92
31879_at       2.18     0.000029  35.44         77.41
37766_s_at     2.18     0.00031   199.93        435.96
33150_at       2.11     0.000238  45.87         96.67
49015_at       2.10     0.000511  424.76        893.38
37490_at       −1.85    0.000932  176.36        95.17
34845_at       −1.96    0.000969  316.63        161.32
37189_at       −2.54    0.000229  359.08        141.13
34986_at       −2.58    0.000147  62.22         24.08
912_s_at       −2.91    0.000877  64.97         22.31
56491_at       −2.92    0.00066   380.98        130.31
41071_at       −3.19    0.000528  84.55         26.51
37874_at       −3.62    0.000424  113.64        31.35
34506_at       −3.72    0.000335  99.7          26.79
47490_at       −3.72    0.000877  109.5         29.41
32177_s_at     −4.10    0.00098   158.6         38.72
35925_at       −5.21    0.000349  268.55        51.52
32570_at       −7.29    0.000098  601.07        82.43
34070_s_at     −7.86    0.000805  191.26        24.33
926_at         −16.09   0.000467  2347.13       145.83

TABLE 2 Patient Information

Sample ID    Donor Gender  Donor Race  Donor Age at Excision  Date of Collection  Organ/Fluid  Tissue Site   Normal or Diseased  Specimen Diagnosis
YUMC-009-01  Male          Korean      54                     Jan. 16, 2001       Stomach      antrum        Normal              NL, stomach
YUMC-009-02  Male          Korean      54                     Jan. 16, 2001       Stomach      antrum        Malignant           adenoca, moderate (AGC)
YUMC-048-01  Female        Korean      65                     Jan. 10, 2001       Stomach      body, cardia  Normal              normal, stomach
YUMC-048-02  Female        Korean      65                     Jan. 10, 2001       Stomach      body, cardia  Malignant           advanced gastric cancer
YUMC-050-01  Male          Korean      62                     Apr. 23, 2001       Stomach      UB            Normal              normal, stomach
YUMC-050-02  Male          Korean      62                     Apr. 23, 2001       Stomach      UB            Malignant           advanced gastric cancer
YUMC-053-01  Male          Korean      68                     Mar. 30, 2001       Stomach      body          Normal              normal, stomach
YUMC-053-02  Male          Korean      68                     Mar. 30, 2001       Stomach      body          Malignant           advanced gastric cancer
YUMC-057-01  Male          Korean      47                     Mar. 7, 2001        Stomach      body          Normal              normal, stomach
YUMC-057-02  Male          Korean      47                     Mar. 7, 2001        Stomach      body          Malignant           advanced gastric cancer

1. A method of diagnosing stomach cancer in a patient, comprising: (a) detecting the level of expression in a tissue sample of one or more genes from Table 1; wherein differential expression of the genes in Table 1 is indicative of stomach cancer.
2. A method of detecting the progression of stomach cancer in a patient, comprising: (a) detecting the level of expression in a tissue sample of one or more genes from Table 1; wherein differential expression of the genes in Table 1 is indicative of stomach cancer progression.
3. A method of monitoring the treatment of a patient with stomach cancer, comprising: (a) administering a pharmaceutical composition to the patient; (b) preparing a gene expression profile of one or more of the genes in Table 1 from a cell or tissue sample from the patient; and (c) comparing the patient gene expression profile to a gene expression profile from a cell population selected from the group consisting of normal stomach cells and cancerous stomach cells.
4. A method of treating a patient with stomach cancer, comprising: (a) administering to the patient a pharmaceutical composition; (b) preparing a gene expression profile of one or more of the genes in Table 1 from a cell or tissue sample from the patient; and (c) comparing the patient expression profile to a gene expression profile selected from the group consisting of normal stomach cells and cancerous stomach cells.
5. A method of typing stomach disease in a patient, comprising: (a) detecting the level of expression in a tissue sample of one or more genes from Table 1; wherein differential expression of the genes in Table 1 is indicative that the stomach disease is stomach cancer.
6. A method of screening for an agent capable of modulating the onset or progression of stomach cancer, comprising: (a) preparing a first gene expression profile of a cell population comprising cancerous stomach cells, wherein the expression profile comprises the expression level of one or more genes from Table 1; (b) exposing the cell population to the agent; (c) preparing second gene expression profile of the agent-exposed cell population; and (d) comparing the first and second gene expression profiles.
7. A composition comprising at least two oligonucleotides, wherein each of the oligonucleotides comprises a sequence that specifically hybridizes to a gene in Table 1.
8. A composition according to claim 7, wherein the composition comprises at least 3 oligonucleotides.
9. A composition according to claim 7, wherein the composition comprises at least 5 oligonucleotides.
10. A composition according to claim 7, wherein the composition comprises at least 7 oligonucleotides.
11. A composition according to claim 7, wherein the composition comprises at least 10 oligonucleotides.
12. A composition according to any one of claims 7, wherein the oligonucleotides are attached to a solid support.
13. A composition according to claim 12, wherein the solid support is selected from a group consisting of a membrane, a glass support, a filter, a tissue culture dish, a polymeric material, a bead and a silica support.
14. A solid support comprising at least two oligonucleotides, wherein each of the oligonucleotides comprises a sequence that specifically hybridizes to a gene in Table 1.
15. A solid support according to claim 14, wherein the oligonucleotides are covalently attached to the solid support.
16. A solid support according to claim 14, wherein the oligonucleotides are non-covalently attached to the solid support.
17. A solid support according to claim 14, wherein the support comprises at least about 10 different oligonucleotides in discrete locations per square centimeter.
18. A solid support according to claim 14, wherein the support comprises at least about 100 different oligonucleotides in discrete locations per square centimeter.
19. A solid support according to claim 14, wherein the support comprises at least about 1000 different oligonucleotides in discrete locations per square centimeter.
20. A solid support according to claim 14, wherein the support comprises at least about 10,000 different oligonucleotides in discrete locations per square centimeter.
21. A computer system comprising: (a) a database containing information identifying the expression level in stomach tissue of a set of genes comprising at least one gene in Table 1; and (b) a user interface to view the information.
22. A computer system of claim 21, wherein the database further comprises sequence information for the genes.
23. A computer system of claim 21, wherein the database further comprises information identifying the expression level for the genes in normal stomach tissue.
24. A computer system of claim 21, wherein the database further comprises information identifying the expression level for the genes in tissue from a stomach tumor.
25. A computer system of any of claims 21-24, further comprising records including descriptive information from an external database, which information correlates said genes to records in the external database.
26. A computer system of claim 25, wherein the external database is GenBank.
27. A method of using a computer system of any one of claims 21-24 to present information identifying the expression level in a tissue or cell of at least one gene in Table 1, comprising: (a) comparing the expression level of at least one gene in Table 1 in the tissue or cell to the level of expression of the gene in the database.
28. A method of claim 27, wherein the expression level of at least two genes are compared.
29. A method of claim 27, wherein the expression level of at least five genes are compared.
30. A method of claim 27, wherein the expression level of at least ten genes are compared.
31. A method of claim 27, further comprising displaying the level of expression of at least one gene in the tissue or cell sample compared to the expression level in stomach cancer.
32. A therapeutic agent for slowing or halting the progression of stomach cancer, wherein the agent is selected from the group consisting of the genes in Table 1, functional fragments of the genes in Table 1, proteins encoded by the genes in Table 1 and functional fragments of said proteins.
33. A method of treating a patient with stomach cancer, comprising: (a) administering to a patient with stomach cancer a pharmaceutical composition comprising all or a portion of at least one gene in Table 1, or a protein encoded therein.